Nazli Bilgic (nazbi056) was responsible for coding and writing analysis for assignment one.
Siddhesh Sreedor (sidsr770) was responsible for coding and writing analysis for assignment two.
We split the work and, after completing our own parts, went through each other's work together to understand it.
In 2004, the blue dots (Aedes albopictus) are most concentrated in the United States, particularly around the Memphis and Jackson regions.
In 2013, there are no blue dots in these same regions. Aedes albopictus is mostly seen in Italy and Taiwan that year.
In 2004, the red dots (Aedes aegypti) are mostly gathered in Brazil and Venezuela.
In 2013, the number of red dots in Brazil has increased.
Many points lie very close to each other, which causes overplotting. In such areas it is hard to tell whether there are many points or only a few. This can mislead anyone analyzing the point distribution on the map and lead to wrong estimates.
The color scale covers the whole range of values evenly, and that range is wide: the mosquito count is 1 for ‘CAN’ and 8501 for ‘BRA’. Because these numbers are far apart, the difference in mosquito counts between these countries is visible from the colors on the map.
For small differences in count (such as 1 versus 10), the values differ numerically, but the colors are so similar that it is difficult to see the difference between low-count countries on the map.
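To make the effect of the wide range concrete, the sketch below compares where a few counts would land on a linear color scale and on a log color scale (the appendix code uses `log(Z)` for the later maps). Only the counts 1 (‘CAN’) and 8501 (‘BRA’) come from the data; the two intermediate counts and their names are made up for illustration.

```r
# CAN and BRA are the counts quoted in the text; the other two are hypothetical
counts <- c(CAN = 1, hypothetical_A = 10, hypothetical_B = 100, BRA = 8501)

# Relative position on a linear color scale (0 = lightest, 1 = darkest)
linear_pos <- (counts - min(counts)) / (max(counts) - min(counts))

# Relative position on a log color scale
log_counts <- log(counts)
log_pos <- (log_counts - min(log_counts)) / (max(log_counts) - min(log_counts))

# Compare the two scales side by side
round(rbind(linear_pos, log_pos), 3)
```

On the linear scale the counts 1, 10 and 100 all sit within the lightest few percent of the color range, whereas on the log scale they are spread out.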
Countries with high mosquito counts are colored darker. Brazil has the most mosquitoes, and the map shows that the USA also has a large number.
Regions close to the poles are noticeably enlarged compared to their true size.
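As a rough illustration, and assuming this remark refers to the equirectangular projection used in the code, the east-west stretch of that projection at latitude φ is 1/cos(φ), while the north-south scale stays constant. The latitudes below are arbitrary examples.

```r
# Arbitrary example latitudes in degrees
lat_deg <- c(0, 30, 60, 80)

# East-west stretch factor of an equirectangular map at each latitude
stretch <- 1 / cos(lat_deg * pi / 180)

data.frame(lat_deg, east_west_stretch = round(stretch, 2))
```

At 60 degrees latitude a region is drawn twice as wide as it would be at the equator, which is why high-latitude regions look enlarged.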
Areas near the edges are distorted and not shown clearly, but they become clear when we zoom in. From the color scale we can clearly tell which countries have the highest mosquito counts (the USA and Brazil have the darkest colors).
Identify regions in Brazil that are most infected by mosquitoes. Did such discretization help in analyzing the distribution of mosquitoes?
## # A tibble: 1,955 × 5
## # Groups: X1 [90]
## X1 Y1 mean_X mean_Y N
## <fct> <fct> <dbl> <dbl> <int>
## 1 [-72.8,-72.4] (-8.21,-7.84] -72.8 -7.96 1
## 2 (-71.2,-70.8] (-8.94,-8.57] -70.8 -8.9 1
## 3 (-70.4,-70] (-10.8,-10.4] -70.0 -10.7 1
## 4 (-70,-69.6] (-4.14,-3.77] -69.7 -4.03 1
## 5 (-69.6,-69.2] (-10.8,-10.4] -69.2 -10.7 1
## 6 (-69.6,-69.2] (-10.1,-9.68] -69.4 -9.77 1
## 7 (-69.2,-68.8] (-1.56,-1.19] -68.8 -1.39 1
## 8 (-68.8,-68.4] (-11.2,-10.8] -68.6 -10.9 1
## 9 (-68.8,-68.4] (-10.8,-10.4] -68.6 -10.6 1
## 10 (-68.8,-68.4] (-10.1,-9.68] -68.4 -10 1
## # ℹ 1,945 more rows
We are now plotting fewer points, so there is less overplotting. This makes it easier to comment accurately on the mosquito distribution across regions.
The regions around ‘Recife’ and ‘Maceio’ have a high concentration of dots, especially blue and purple ones, which indicate a high number of mosquitoes. The ‘Sao Paulo’, ‘Ribeirao Preto’ and ‘Londrina’ regions also show high numbers of mosquitoes.
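The discretization idea can be sketched on toy data as follows. This is a minimal sketch, not the code used for the map: the coordinates are randomly generated, and base R `cut()` and `aggregate()` stand in for the `cut_interval()` and `group_by()`/`summarize()` calls in the appendix.

```r
# Hypothetical coordinates: three tight clusters of points (not the real data)
set.seed(1)
pts <- data.frame(
  X = c(rnorm(50, -35, 0.05), rnorm(30, -46.6, 0.05), rnorm(5, -60, 0.05)),
  Y = c(rnorm(50, -8, 0.05), rnorm(30, -23.5, 0.05), rnorm(5, -3, 0.05))
)

# Equal-width bins along each axis
pts$X1 <- cut(pts$X, breaks = 10)
pts$Y1 <- cut(pts$Y, breaks = 10)

# One row per occupied grid cell: mean position of the points in the cell
cells <- aggregate(cbind(X, Y) ~ X1 + Y1, data = pts, FUN = mean)

# Number of points in each cell (same grouping, so the rows line up)
cells$N <- aggregate(X ~ X1 + Y1, data = pts, FUN = length)$X

# Cells sorted from most to fewest points
cells[order(-cells$N), ]
```

Each cell is then drawn as a single marker whose size or color encodes `N`, so dense clusters no longer hide how many observations they contain.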
Download a relevant map of Swedish counties from http://gadm.org/country and load it into R. Read your data into R and process it in such a way that different age groups are shown in different columns. Let’s call these groups Young, Adult and Senior.
Create a plot in Plotly containing three violin plots showing mean income distributions per age group. Analyze this plot and interpret your analysis in terms of income.
From the box plots, we can see that the income of the young group is much lower than that of the adult and senior groups, while senior income is slightly higher than adult income. A possible explanation is that people in the young category have little work experience, so they start with a lower income. As they gain experience they move into the adult category and their income increases. They then slowly reach a saturation point in the senior category, where income does not increase as drastically as it does from young to adult.
We see a positive correlation, which indicates that senior incomes tend to increase proportionally with adult and young incomes across the counties.
The surface shows a roughly linear trend, so linear regression would be suitable to model this dependence, as it assumes a linear relationship between the independent variables (adult and young) and the dependent variable (senior).
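A linear model of this kind could be fitted as sketched below. The incomes here are simulated purely for illustration (the coefficients and noise levels are assumptions, not estimates from the lab data); with the real data frame the same `lm()` call would be used on its `young`, `adult` and `senior` columns.

```r
# Hypothetical county-level incomes (simulated, not the lab data)
set.seed(42)
young  <- runif(21, 300, 400)
adult  <- 1.5 * young + rnorm(21, sd = 10)
senior <- 50 + 0.4 * young + 0.6 * adult + rnorm(21, sd = 5)
incomes <- data.frame(young, adult, senior)

# Senior income modelled as a linear function of young and adult income
fit <- lm(senior ~ young + adult, data = incomes)

coef(fit)               # intercept and the two slopes
summary(fit)$r.squared  # share of variance explained by the model
```

Because young and adult incomes are themselves strongly correlated, the individual slopes should be interpreted with care even when the overall fit is good.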
An interesting observation that we can see from this plot for adults is that as we move from the north to the south, the income increases.
For the young category we do not notice such a strong north-to-south pattern, which may be because young people have less work experience to dictate their income.
knitr::opts_chunk$set(echo = TRUE)
library(plotly)
library(dplyr)
library(readr)
mosquito_data<-read.csv("aegypti_albopictus.csv")
data_2004 <- mosquito_data %>% filter(YEAR == 2004)
data_2013 <- mosquito_data %>% filter(YEAR == 2013)
Sys.setenv('MAPBOX_TOKEN' = 'pk.eyJ1IjoibmF6YmkwNTYiLCJhIjoiY20xNm1wdnl2MGgwNTJscXhuZzNzZmh2dSJ9.juf_tlUFGCUHvY9MCSX5lw')
p_2004<-plot_mapbox(data_2004)%>%add_trace(type="scattermapbox",lat=~Y, lon=~X,
color=~VECTOR,colors = c('red', 'blue'))%>%
layout(
title = "mosquito species distribution-2004"
)
p_2013<-plot_mapbox(data_2013)%>%add_trace(type="scattermapbox",lat=~Y, lon=~X,
color=~VECTOR,colors = c('red', 'blue'))%>%
layout(
title = "mosquito species distribution-2013"
)
p_2004
p_2013
mosquito_count<- table(mosquito_data$COUNTRY_ID)
mosquito_count_country <- as.data.frame(mosquito_count)
colnames(mosquito_count_country) <- c("COUNTRY_ID", "Z")
g<-list(fitbounds="locations", visible=FALSE,projection = list(type = "equirectangular"))
p_geo<-plot_geo(mosquito_count_country)%>%add_trace(type="choropleth",
z = ~Z,locations = ~COUNTRY_ID,
colors = "Blues")%>% colorbar(title='Number of <br>Occurrences') %>% layout(geo=g)
p_geo
g_equirectengular<-list(projection = list(type = "equirectangular"))
p_geo_log<-plot_geo(mosquito_count_country)%>%add_trace(type="choropleth",
z = ~log(Z),locations = ~COUNTRY_ID,
colors = "Blues") %>%
layout(geo=g_equirectengular)
p_geo_log
g_conic<-list(projection = list(type = "conic equal area"))
p_geo_conic<-plot_geo(mosquito_count_country)%>%add_trace(type="choropleth",
z = ~log(Z),locations = ~COUNTRY_ID,
colors = "Blues") %>% layout(geo=g_conic)
p_geo_conic
data_2013_brazil <- data_2013 %>% filter(COUNTRY_ID == "BRA")
X<-c(data_2013_brazil$X)
X1<-cut_interval(X,n=100)
Y<-c(data_2013_brazil$Y)
Y1<-cut_interval(Y,n=100)
new_df_x1_y1<-data.frame(X=X,Y=Y,X1=X1,Y1=Y1)
result <- new_df_x1_y1 %>%
group_by(X1, Y1) %>%
summarize(
mean_X = mean(X),
mean_Y = mean(Y),
N = n()
)
result
p_mean<-plot_mapbox(result)%>%add_trace(type="scattermapbox", mode = 'markers',lat=~mean_Y,
lon=~mean_X,size=~N,
color=~N,colors = c('red', 'blue'),
text= ~paste("meanx:", mean_X, "<br>meanY:",mean_Y, "<br>N:", N),
hoverinfo ='text')
p_mean
data = read.csv("data.csv",header = TRUE)
library(stringr)
library(tidyr)
#setwd("~/Downloads/M A S T E R S /Sem-3/Part-1/Visualization/labs/lab-3")
county_map<-jsonlite::read_json("county.json")
data$region<-str_sub(data$region,4,-1)
data$region<-str_sub(data$region,1,-8)
data <- spread(data,"age","X2016")
colnames(data) <- c("region","young","adult","senior")
data$region[4]<- "Gävleborg"
data$region[6]<- "Jämtland"
data$region[7]<- "Jönköping"
data$region[11]<- "Skåne"
data$region[13]<- "Södermanland"
data$region[15]<- "Värmland"
data$region[16]<- "Västerbotten"
data$region[17]<- "Västernorrland"
data$region[18]<- "Västmanland"
data$region[19]<- "VästraGötaland"
data$region[20]<- "Örebro"
data$region[21]<- "Östergötland"
# 2)
library(plotly)
data %>% plot_ly(y= ~young, type = "box" , name = "young") %>% add_trace(y = ~adult, type ="box", name = "adult") %>% add_trace(y = ~senior, type ="box", name = "senior") %>% layout(yaxis = list(title = 'Income'))
library(akima)
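# Interpolate senior income over the (young, adult) plane; duplicate points are averaged
s <- with(data, interp(young, adult, senior, duplicate = "mean"))
# interp() returns z[i, j] at (x[i], y[j]), while a plotly surface reads rows of z along y, hence t()
# 3D axis titles are set through `scene`; a top-level `yaxis` has no effect on a surface plot
plot_ly(x=~s$x, y=~s$y, z=~t(s$z), type="surface")%>%
layout(scene = list(xaxis = list(title = 'Young income'), yaxis = list(title = 'Adult income'), zaxis = list(title = 'Senior income')))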
g=list(fitbounds="locations", visible=FALSE)
plot_geo(data)%>%add_trace(type="choropleth",geojson=county_map, locations=~region,z=~adult, featureidkey="properties.NAME_1")%>%layout(geo=g)
g=list(fitbounds="locations", visible=FALSE)
plot_geo(data)%>%add_trace(type="choropleth",geojson=county_map, locations=~region,z=~young, featureidkey="properties.NAME_1")%>%layout(geo=g)
g=list(fitbounds="locations", visible=FALSE)
plot_geo(data)%>%add_trace(type="choropleth",geojson=county_map, locations=~region,z=~young, featureidkey="properties.NAME_1") %>% add_trace(type="scattergeo",lat=~58.4108, lon=~15.6214) %>%layout(geo=g)